-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Runtime][WIP] Add prototype Relay AoT compiler directly into TVM #6219
Conversation
Thanks @slyubomirsky . Some highlevel comments. First of all, AOT should not be part of the runtime. Runtime contains the minimum set of things we need to execute the program. Most of the AOT logic are actually part of target translation phase. Taking the current organization of relay into account, they should be moved to Of course there are additional runtime features and wrappers needed to run the program compiled from AOT. From the interface point of view, we should remove these wrapper code and completely rely on the PackedFunc and runtime.Module interface. So the AOT compilation should take in an IRModule and output a runtime.Module, which contains the functions necessary to run the generated program. Ideally the runtime.Module should contain similar interface with other compiled programs, such as the vm and the graph runtime |
Thank you for the suggestions! I will move the files to |
Hm, if the CI fails because edit: I guess I can use |
if lib_path is None: | ||
lib_path = os.curdir | ||
|
||
debug_source_path = os.path.join(lib_path, 'source.cc') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put this functionality behind a debug flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
with open(debug_source_path, 'w') as source_file: | ||
source_file.write(source) | ||
|
||
# with tempfile.TmporaryDirectory() as tmpdir: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
@slyubomirsky Thanks for the work! -- Some very high-level comments. IIUC, this is bypassing the TIR lowering as it stands today. Thus, possibly losing the benefits TIR scheduling based optimizations in AOT compilation path. I just wanted to know what the roadmap looks like if we are to re-target the AOT codegen at the TIR level. Would it be an incremental work on top of this ? or would it require a complete re-write ? |
@manupa-arm the primitive function is still lowered to TIR - we only compile Relay fragment to C++. This is in accordance to how Relay had work for Interpreter/VM - Relay fragment get handle separately where primtive function get lowered to TIR and handled by TVM. If you are talking about lowering everything to TIR, the biggest problem is the design implication. It will make TIR less like fortran and way more like SML. |
@MarisaKirisame , thanks for the clarification! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments, see if you agree.
return res if not record_time else (res, end - begin) | ||
return _wrapper | ||
|
||
def compile_prog(func, mod, ctx, tgt, name='default', record_time=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be a good idea to make the func the "main" inside mod ? So that we only needs to pass the mod with a "main".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this was definitely a flaw in the research prototype that we never took the time to correct. I agree that using the main function would be a much more sensible convention
f"-L{TVM_PATH}/build" | ||
] | ||
|
||
if system == 'Darwin': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler config could be a class variable -- a dictionary to be precise.
Maybe look that up for the flags and refactor the rest as they are mostly same.
Maybe a comment explaining the special cased flags for "Darwin" could be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think this should be written in a more maintainable manner. I will see if it can be made to programmatically match up with TVM's own C++ build configuration, for example
_LIB_COUNTER += 1 | ||
return lib_name, packed_name | ||
|
||
def _mk_wrapper(func, ctx, constants, record_time): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wrapper seems to enable runtime perf. measurement; Maybe add a comment describing the functionality;
Also it would be better make record_time, default to False.
[Suggestion] make this a decorator and remove the wrapper from the implemetation and use the decorator where the performance measurement is needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may remove the time recording feature entirely for now, as it was something we included ad hoc for a single experiment
func = compiler.visit(func) | ||
lib_name, packed_name = lib_and_func_name(name) | ||
constants, source_code = to_source.to_source(func, compiler.gv_map, ctx, packed_name) | ||
lib_name = f"librelay_aot_{_LIB_COUNTER}.so" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for user to optionally give the paths for the artifacts : .so and .cc ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be a good option, I will include that in my next revision.
[Clarification Question] How are the memory allocated for tensors -- in-between primitive functions -- ? Please point me to the code if its there -- It seems I have missed that. Do you do storage_id usage optimizations such as done in graph plan memory ? |
|
||
must_run_process(["clang-format", "-i", debug_source_path]) | ||
|
||
system = os.uname()[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make this an argument to the compile cpp_function?
In future, it would be easier to enable cross-compilation.
There are not, to my knowledge (@MarisaKirisame wrote most of the compiler), any memory planning optimizations in the AoT prototype, though it would certainly be a good addition. I never specifically looked into the memory allocation behavior (it was an area we ignored in the prototype altogether), but I believe allocations simply happen when the NDArray constructor is called in the generated code -- I will check that. |
I am doing triage on old PRs, going to close this, please feel free to follow up if you would like to still merge these changes. Thanks for your contributions!. |
In reference to this RFC, this PR is intended to incorporate the existing external Relay ahead-of-time (AoT) compiler, which was primarily written by @MarisaKirisame, into TVM. To start, I am simply including most of the files from the AoT compiler repo nearly verbatim, though the interfaces should be changed to better adhere to the high-level vision for TVM (especially since the initial code comes from a research prototype).
The prototype AoT compiler operates by translating Relay ASTs directly into C++ code and using TVM's JIT compiler to register all primitive functions (i.e., the C++ code calls into TVM's operator cache to handle operators). This results in producing a C++ file and requires calling the system's C++ compiler (in the prototype, assuming it to be
clang
).I would be curious to hear others' thoughts (e.g., @jroesch @weberlo @tqchen) about how this compiler can be better integrated into TVM's systems. Based on the discussion in the RFC, it sounds that the interface should be made to take an IRModule and produce a runtime module that can call the compiled functions. Ideally the system could be made modular to allow for target languages other than C++